Emulating Subjective Criteria in Corpus Validation
نویسندگان
چکیده
The use of speech in human-machine interaction is increasing as the computer interfaces are becoming more complex but also more useable. These interfaces make use of the information obtained from the user through the analysis of different modalities and show a specific answer by means of different media. The origin of the multimodal systems can be found in its precursor, the “Put-That-There” system (Bolt, 1980), an application operated by speech and gesture recognition. The use of speech as one of these modalities to get orders from users and to provide some oral information makes the human-machine communication more natural. There is a growing number of applications that use speech-to-text conversion and animated characters with speech synthesis. One way to improve the naturalness of these interfaces is the incorporation of the recognition of user’s emotional states (Campbell, 2000). This point generally requires the creation of speech databases showing authentic emotional content allowing robust analysis. Cowie, Douglas-Cowie & Cox (2005) present some databases showing an increase in multimodal databases, and Ververidis & Kotropoulos (2006) describe 64 databases and their application. When creating this kind of databases the main arising problem is the naturalness of the locutions, which directly depends on the method used in the recordings, assuming that they must be controlled without interfering the authenticity of the locutions. Campbell (2000) and Schröder (2004) propose four different sources for obtaining emotional speech, ordered from less control but more authenticity to more control but less authenticity: i) natural occurrences, ii) provocation of authentic emotions in laboratory conditions, iii) stimulated emotions by means of prepared texts, and iv) acted speech reading the same texts with different emotional states, usually performed by actors. On the one hand, corpora designed to synthesize emotional speech are based on studies centred on the listener, following the distinction made by Schröder (2004), because they model the speech parameters in order to transmit a specific emotion. On the other hand, emotion recognition implies studies centred on the speaker, because they are related to the speaker emotional state and the parameters of the speech. The validation of a corpus used for synthesis involves both kinds of studies: the former since it will be used for synthesis and the latter since recognition is needed to evaluate its content. The best validation system is the selection of the valid utterances1 of the corpus by human listeners. However, the big size of a corpus makes this process unaffordable.
منابع مشابه
Expressive Speech Corpus Validation by Mapping Subjective Perception to Automatic Classification Based on Prosody and Voice Quality
This paper presents the validation of the expressiveness of an acted corpus produced to be used in speech synthesis, as this kind of emotional speech can be rather lacking in authenticity. The goal is to obtain a system which is able to prune bad utterances from an expressiveness point of view. The results from a previous subjective test are used for the training of a multistage emotional ident...
متن کاملObjective and Subjective Evaluation of an Expressive Speech Corpus
This paper presents the validation of the expressive content of an acted oral corpus produced to be used in speech synthesis. Firstly, objective validation has been conducted by means of automatic emotion identification techniques using statistical features obtained from the prosodic parameters of speech. Secondly, a listening test has been performed with a subset of utterances. The relationshi...
متن کاملValidation of an Expressive Speech Corpus by Mapping Automatic Classification to Subjective Evaluation
This paper presents the validation of the expressive content of an acted corpus produced to be used in speech synthesis. The use of acted speech can be rather lacking in authenticity and therefore its expressiveness validation is required. The goal is to obtain an automatic classifier able to prune the bad utterances –with wrong expressiveness–. Firstly, a subjective test has been conducted wit...
متن کاملEvaluation and Improvement of Generic-Emulating DPA Attacks
At CT-RSA 2014, Whitnall, Oswald and Standaert gave the impossibility result that no generic DPA strategies (i.e., without any a priori knowledge about the leakage characteristics) can recover secret information from a physical device by considering an injective target function (e.g., AES and PRESENT S-boxes), and as a remedy, they proposed a slightly relaxed strategy “generic-emulating DPAs” f...
متن کاملOn Building Mixed Lingual Speech Synthesis Systems
Codemixing phenomenon where lexical items from one language are embedded in the utterance of anotheris relatively frequent in multilingual communities. However, TTS systems today are not fully capable of effectively handling such mixed content despite achieving high quality in the monolingual case. In this paper, we investigate various mechanisms for building mixed lingual systems which are bui...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009